The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a notable portion of participants (32%) stated that they did not have enough time for method development. 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based on either multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
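The following is a minimal sketch (not taken from the survey itself) of two practices it measures: k-fold cross-validation on the training set and ensembling of the resulting fold models. The dataset, model choice, and averaging rule are illustrative assumptions.

```python
# Sketch of k-fold cross-validation plus an ensemble of identical fold models.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

fold_models = []
for train_idx, val_idx in StratifiedKFold(n_splits=5, shuffle=True, random_state=0).split(X_train, y_train):
    model = RandomForestClassifier(n_estimators=200, random_state=0)
    model.fit(X_train[train_idx], y_train[train_idx])
    val_auc = roc_auc_score(y_train[val_idx], model.predict_proba(X_train[val_idx])[:, 1])
    print("fold validation AUC:", round(val_auc, 3))
    fold_models.append(model)

# Ensemble of identical architectures: average the per-fold predicted probabilities.
ensemble_prob = np.mean([m.predict_proba(X_test)[:, 1] for m in fold_models], axis=0)
print("ensembled test AUC:", round(roc_auc_score(y_test, ensemble_prob), 3))
```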
Generalist models, which are capable of performing diverse multi-modal tasks in a task-agnostic way within a single model, have been explored recently. Though they offer a promising path toward general-purpose AI, existing generalist models are still at an early stage, with limited modality and task coverage. To empower multi-modal task-scaling and speed up this line of research, we release a generalist model learning system, OFASys, built on top of a declarative task interface named multi-modal instruction. At the core of OFASys is the idea of decoupling multi-modal task representations from the underlying model implementations. In OFASys, a task involving multiple modalities can be defined declaratively, even with just a single line of code. The system automatically generates task plans from such instructions for training and inference. It also facilitates multi-task training for diverse multi-modal workloads. As a starting point, we provide presets of 7 different modalities and 23 highly diverse example tasks in OFASys, with which we also develop a first-of-its-kind single model, OFA+, that can handle text, image, speech, video, and motion data. The single OFA+ model achieves, on average, 95% of the performance of 15 task-finetuned models with only 16% of their parameters, showcasing the performance reliability of multi-modal task-scaling provided by OFASys. Available at https://github.com/OFA-Sys/OFASys
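To make the notion of a one-line declarative multi-modal instruction concrete, here is a hypothetical toy parser. This is not the actual OFASys API or instruction syntax (see the repository for those); the slot notation, function, and class names are assumptions used only to illustrate the idea of declaring typed input/output slots from which a system could derive a task plan.

```python
# Toy illustration of parsing a declarative multi-modal instruction into slots.
import re
from dataclasses import dataclass

@dataclass
class Slot:
    modality: str    # e.g. "IMAGE", "TEXT", "AUDIO" (hypothetical tags)
    name: str        # variable bound to data at training/inference time
    is_target: bool  # whether the slot appears on the output side of "->"

def parse_instruction(instruction: str) -> list:
    """Parse a toy instruction such as '[IMAGE:img] what does the image describe? -> [TEXT:caption]'."""
    source, _, target = instruction.partition("->")
    slots = []
    for side, is_target in ((source, False), (target, True)):
        for modality, name in re.findall(r"\[([A-Z]+):(\w+)\]", side):
            slots.append(Slot(modality, name, is_target))
    return slots

# Prints one IMAGE input slot and one TEXT target slot.
print(parse_instruction("[IMAGE:img] what does the image describe? -> [TEXT:caption]"))
```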
Time series analysis has achieved great success in diverse applications such as cybersecurity, environmental monitoring, and medical informatics. Learning similarities between different time series is a key problem, since it underpins downstream analyses such as clustering and anomaly detection. Due to the complex temporal dynamics of time series generated by event-triggered sensing, which is common in a variety of applications including automated driving, interactive healthcare, and smart home automation, it is often unclear which distance metric is suitable for similarity learning. The overall goal of this paper is to develop an unsupervised learning framework capable of learning task-aware similarities among unlabeled event-triggered time series. From a machine learning vantage point, the proposed framework harnesses the power of a hierarchical multi-scale sequence autoencoder and a Gaussian mixture model (GMM) to effectively learn low-dimensional representations of time series. Finally, the obtained similarity measure can be easily visualized for interpretation. The proposed framework aspires to offer a stepping stone toward a systematic approach for learning similarities among a multitude of event-triggered time series. Extensive qualitative and quantitative experiments reveal that the proposed approach significantly outperforms state-of-the-art methods.
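A minimal sketch of the general recipe described above, under assumed details: a sequence autoencoder compresses each time series to a low-dimensional code, a GMM is fit on the codes, and pairwise similarity is taken from the overlap of posterior mixture memberships. The paper's hierarchical multi-scale architecture is not reproduced; the GRU autoencoder, dimensions, and similarity definition below are illustrative.

```python
# Autoencoder codes + GMM posteriors as a task-aware similarity proxy.
import torch
import torch.nn as nn
from sklearn.mixture import GaussianMixture

class SeqAutoencoder(nn.Module):
    def __init__(self, dim=1, latent=8):
        super().__init__()
        self.encoder = nn.GRU(dim, latent, batch_first=True)
        self.decoder = nn.GRU(latent, latent, batch_first=True)
        self.out = nn.Linear(latent, dim)

    def forward(self, x):                          # x: (batch, time, dim)
        _, h = self.encoder(x)                     # h: (1, batch, latent)
        z = h.squeeze(0)                           # low-dimensional code per series
        dec, _ = self.decoder(z.unsqueeze(1).repeat(1, x.size(1), 1))
        return self.out(dec), z

series = torch.randn(128, 50, 1)                   # 128 toy series of length 50
model = SeqAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):                               # reconstruction-only training
    recon, _ = model(series)
    loss = nn.functional.mse_loss(recon, series)
    opt.zero_grad(); loss.backward(); opt.step()

with torch.no_grad():
    _, codes = model(series)
gmm = GaussianMixture(n_components=4, random_state=0).fit(codes.numpy())
resp = gmm.predict_proba(codes.numpy())            # soft cluster memberships
similarity = resp @ resp.T                         # pairwise similarity matrix
```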
Hindsight experience replay (HER) is a goal relabelling technique commonly used with off-policy deep reinforcement learning algorithms to solve goal-oriented tasks; it is well suited to robotic manipulation tasks that provide only sparse rewards. In HER, both trajectories and transitions are sampled uniformly for training. However, not all of the agent's experiences contribute equally to training, so naive uniform sampling may lead to inefficient learning. In this paper, we propose diversity-based trajectory and goal selection with HER (DTGSH). Firstly, trajectories are sampled according to the diversity of their goal states, as modelled by determinantal point processes (DPPs). Secondly, transitions with diverse goal states are selected from the trajectories by using k-DPPs. We evaluate DTGSH on five challenging robotic manipulation tasks in simulated robot environments, where we show that our method can learn more quickly and reach higher performance than other state-of-the-art approaches on all tasks.
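The sketch below illustrates diversity-based goal selection in the spirit of the above, with assumed details: achieved goal states are compared through an RBF similarity kernel, and a greedy determinant-maximization heuristic picks k mutually diverse goals. Note that the paper samples from DPP/k-DPP models, whereas this greedy MAP-style approximation is only an illustration.

```python
# Greedy selection of k diverse goal states via determinant maximization.
import numpy as np

def select_diverse_goals(goals: np.ndarray, k: int, bandwidth: float = 1.0) -> list:
    """Greedily pick k indices maximizing the log-determinant of the kernel submatrix."""
    sq_dists = ((goals[:, None, :] - goals[None, :, :]) ** 2).sum(-1)
    L = np.exp(-sq_dists / (2 * bandwidth ** 2))          # RBF similarity kernel
    selected = []
    for _ in range(k):
        best, best_logdet = None, -np.inf
        for i in range(len(goals)):
            if i in selected:
                continue
            idx = selected + [i]
            sign, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
            if sign > 0 and logdet > best_logdet:
                best, best_logdet = i, logdet
        selected.append(best)
    return selected

goals = np.random.randn(100, 3)            # e.g. achieved object positions
print(select_diverse_goals(goals, k=8))    # indices of 8 mutually dissimilar goals
```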
Graph-based anomaly detection (GAD) has become prevalent due to the powerful representation ability of graphs as well as recent advances in graph mining techniques. These GAD tools, however, expose a new attack surface, ironically due to their unique advantage of being able to exploit the relations among data. That is, attackers can now manipulate those relations (i.e., the structure of the graph) to allow target nodes to evade detection. In this paper, we exploit this vulnerability by designing a new type of targeted structural poisoning attack against a representative regression-based GAD system, OddBall. Specifically, we formulate the attack against OddBall as a bi-level optimization problem, where the key technical challenge is to efficiently solve the problem in a discrete domain. We propose a novel gradient-descent-based attack method called BinarizedAttack. Compared with prior art, BinarizedAttack makes better use of the gradient information, making it particularly suitable for solving combinatorial optimization problems. Furthermore, we investigate the attack transferability of BinarizedAttack by employing it to attack other representation-learning-based GAD systems. Our comprehensive experiments demonstrate that BinarizedAttack is very effective in enabling target nodes to evade graph-based anomaly detection tools with a limited attack budget; in the black-box transfer-attack setting, BinarizedAttack is also shown to be effective and, in particular, can significantly change the node embeddings learned by the GAD systems. Our research thus opens the door to studying a new type of attack against security analytics tools that rely on graph data.
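As a simplified illustration (not the exact BinarizedAttack algorithm), the sketch below shows the gradient-guided structural poisoning idea: the adjacency matrix is treated as a continuous variable, the gradient of a surrogate anomaly score for the target node is computed, and the attacker's budget is spent on the binary edge flips whose first-order effect most strongly decreases that score. The surrogate score is a toy placeholder, not OddBall's objective.

```python
# Gradient-guided binary edge flips against a differentiable surrogate anomaly score.
import torch

def poison_edges(adj: torch.Tensor, anomaly_score, target: int, budget: int) -> torch.Tensor:
    """adj: dense 0/1 adjacency (n x n); anomaly_score: differentiable fn of (adj, target)."""
    a = adj.clone().float().requires_grad_(True)
    anomaly_score(a, target).backward()
    # Flipping entry (i, j): 0 -> 1 changes the score by ~+grad, 1 -> 0 by ~-grad,
    # so the estimated benefit (score decrease) of a flip is -(1 - 2*adj) * grad.
    gain = -(1.0 - 2.0 * adj) * a.grad
    top = torch.topk(gain.flatten(), budget).indices
    poisoned = adj.clone()
    for idx in top:
        i, j = divmod(int(idx), adj.size(1))
        poisoned[i, j] = poisoned[j, i] = 1.0 - poisoned[i, j]   # keep the graph undirected
    return poisoned

# Toy surrogate: a node looks less anomalous when its degree matches the mean degree.
def degree_gap_score(a, target):
    deg = a.sum(dim=1)
    return (deg[target] - deg.mean()).abs()

adj = (torch.rand(30, 30) < 0.1).float()
adj = torch.triu(adj, 1); adj = adj + adj.T
print(degree_gap_score(adj, 0), degree_gap_score(poison_edges(adj, degree_gap_score, 0, budget=3), 0))
```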
This paper studies how to improve the generalization performance and learning speed of navigation agents trained with deep reinforcement learning (DRL). Although DRL has shown great potential in mapless navigation, DRL agents that perform well in training scenarios often perform poorly in unfamiliar ones. In this work, we identify the representation of LiDAR readings as a key factor behind the agents' performance degradation and propose a powerful input preprocessing (IP) approach to address this issue. Because this approach uses adaptively parameterized reciprocal functions to preprocess the LiDAR readings, we call it IPAPRec, and its normalized version IPAPRecN. IPAPRec/IPAPRecN can highlight important short-distance values and compress the range of less important long-distance values in laser scans, which well addresses the issues induced by conventional representations of laser scans. Their high performance is validated by extensive simulation and real-world experiments. The results show that our methods can substantially improve the generalization performance of navigation agents and greatly reduce training time, compared with conventional approaches.
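The snippet below sketches the idea of reciprocal preprocessing; the exact adaptive parameterization is defined (and partly learned) in the paper, so the constants and normalization used here are illustrative assumptions only.

```python
# Reciprocal-style preprocessing of LiDAR ranges: expand near readings, compress far ones.
import numpy as np

def ipaprec_like(ranges: np.ndarray, a: float = 1.0) -> np.ndarray:
    """Reciprocal mapping of raw laser ranges (meters); 'a' is an adaptable gain."""
    return a / (ranges + 1e-6)            # short distances -> large values, long -> near zero

def ipaprecn_like(ranges: np.ndarray, a: float = 1.0, r_max: float = 10.0) -> np.ndarray:
    """Normalized variant mapping the preprocessed values into [0, 1]."""
    x = ipaprec_like(np.clip(ranges, None, r_max), a)
    return (x - x.min()) / (x.max() - x.min() + 1e-6)

scan = np.array([0.3, 0.8, 2.0, 6.0, 10.0])   # raw LiDAR readings in meters
print(ipaprec_like(scan))                     # nearby obstacles dominate the representation
print(ipaprecn_like(scan))
```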
Masked image modeling (MIM) performs strongly in pre-training large vision Transformers (ViTs). However, small models that are critical for real-world applications cannot, or can only marginally, benefit from this pre-training approach. In this paper, we explore distillation techniques to transfer the success of large MIM-based pre-trained models to smaller ones. We systematically study different options in the distillation framework, including distillation targets, losses, input, network regularization, sequential distillation, etc., revealing that: 1) Distilling token relations is more effective than CLS-token- and feature-based distillation; 2) Using an intermediate layer of the teacher network as the target performs better than using its last layer when the depth of the student mismatches that of the teacher; 3) Weak regularization is preferred; etc. With these findings, we achieve significant fine-tuning accuracy improvements over MIM pre-training from scratch on ImageNet-1K classification, using the ViT-Tiny, ViT-Small, and ViT-Base models, with +4.2%/+2.4%/+1.4% gains, respectively. Our TinyMIM model of base size achieves 52.2 mIoU on ADE20K semantic segmentation, which is +4.1 mIoU higher than the MAE baseline. Our TinyMIM model of tiny size achieves 79.6% top-1 accuracy on ImageNet-1K image classification, which sets a new record for small vision models of the same size and computation budget. This strong performance suggests an alternative way of developing small vision Transformer models, that is, by exploring better training methods rather than introducing inductive biases into architectures as in most previous works. Code is available at https://github.com/OliverRensu/TinyMIM.
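A minimal sketch of token-relation distillation under assumed details (see the TinyMIM code for the exact targets and layers): instead of matching features directly, the student matches the teacher's token-to-token similarity map computed from an intermediate layer's patch tokens; dimensions and the temperature are illustrative.

```python
# Distill token-to-token relation maps from teacher to student.
import torch
import torch.nn.functional as F

def token_relation(tokens: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """tokens: (batch, num_tokens, dim) -> row-normalized token-to-token relation map."""
    tokens = F.normalize(tokens, dim=-1)
    return F.softmax(tokens @ tokens.transpose(1, 2) / temperature, dim=-1)

def relation_distillation_loss(student_tokens, teacher_tokens):
    """KL divergence between student and teacher relation maps (dims may differ)."""
    s = token_relation(student_tokens)
    t = token_relation(teacher_tokens)
    return F.kl_div(s.clamp_min(1e-8).log(), t, reduction="batchmean")

# e.g. 196 patch tokens from an intermediate teacher layer vs. the student's last layer
teacher_tokens = torch.randn(4, 196, 768)
student_tokens = torch.randn(4, 196, 192, requires_grad=True)
loss = relation_distillation_loss(student_tokens, teacher_tokens)
loss.backward()
```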
The development of social media user stance detection and bot detection methods relies heavily on large-scale and high-quality benchmarks. However, in addition to low annotation quality, existing benchmarks generally have incomplete user relationships, which hampers graph-based account detection research. To address these issues, we propose a Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based benchmark for account detection. To our knowledge, MGTAB is built on the largest raw data collection in the field, with over 1.55 million users and 130 million tweets. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diversified relations. In MGTAB, we extract the 20 user property features with the greatest information gain, together with user tweet features, as the user features. In addition, we perform a thorough evaluation of MGTAB and other public datasets. Our experiments find that graph-based approaches are generally more effective than feature-based approaches and perform better when introducing multiple relations. By analyzing the experimental results, we identify effective approaches for account detection and provide potential future research directions in this field. Our benchmark and standardized evaluation procedures are freely available at: https://github.com/GraphDetec/MGTAB.
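The following sketch illustrates the feature-selection step described above: picking the user property features with the greatest information gain with respect to the label. Mutual information is used here as the information-gain estimate, and the feature matrix and labels are random placeholders rather than MGTAB data.

```python
# Select the top-20 user property features by estimated information gain.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
user_features = rng.normal(size=(1000, 40))   # e.g. 40 candidate property features
labels = rng.integers(0, 2, size=1000)        # e.g. bot / human annotation

gain = mutual_info_classif(user_features, labels, random_state=0)
top20 = np.argsort(gain)[::-1][:20]           # indices of the 20 most informative features
selected = user_features[:, top20]
print(sorted(top20.tolist()))
```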
The Transformer has achieved impressive successes for various computer vision tasks. However, most existing studies require pretraining the Transformer backbone on a large-scale labeled dataset (e.g., ImageNet) to achieve satisfactory performance, which is usually unavailable for medical images. Additionally, due to the gap between medical and natural images, the improvement generated by ImageNet pretrained weights significantly degrades when transferring the weights to medical image processing tasks. In this paper, we propose Bootstrap Own Latent of Transformer (BOLT), a self-supervised learning approach specifically for medical image classification with a Transformer backbone. Our BOLT consists of two networks, namely online and target branches, for self-supervised representation learning. Concretely, the online network is trained to predict the target network's representation of the same patch embedding tokens under a different perturbation. To make the most of the Transformer with limited medical data, we propose an auxiliary difficulty ranking task, in which the Transformer is required to identify which branch (i.e., online/target) is processing the more difficult perturbed tokens. Overall, the Transformer is encouraged to distill transformation-invariant features from the perturbed tokens to simultaneously achieve difficulty measurement and maintain the consistency of self-supervised representations. The proposed BOLT is evaluated on three medical image processing tasks, i.e., skin lesion classification, knee fatigue fracture grading, and diabetic retinopathy grading. The experimental results validate the superiority of our BOLT for medical image classification, compared to ImageNet pretrained weights and state-of-the-art self-supervised learning approaches.
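Below is a minimal sketch of the online/target consistency objective in the spirit of the above (the architecture, perturbation, and momentum value are placeholder assumptions, and the auxiliary difficulty-ranking head is omitted): the online branch predicts the target branch's representation of differently perturbed patch embeddings, and the target branch is updated as an exponential moving average of the online branch.

```python
# Online/target branch consistency loss with an EMA-updated target branch.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Linear(768, 256), nn.GELU(), nn.Linear(256, 128))  # stand-in for the ViT
predictor = nn.Linear(128, 128)                                               # online-only prediction head
target_encoder = copy.deepcopy(encoder)
for p in target_encoder.parameters():
    p.requires_grad_(False)

def perturb(tokens, noise=0.1):                      # placeholder for the paper's token perturbations
    return tokens + noise * torch.randn_like(tokens)

tokens = torch.randn(8, 196, 768)                    # a batch of patch embedding tokens
online_out = predictor(encoder(perturb(tokens)))
with torch.no_grad():
    target_out = target_encoder(perturb(tokens))

# Negative cosine similarity between the online prediction and the target representation.
loss = -F.cosine_similarity(online_out, target_out, dim=-1).mean()
loss.backward()

# EMA update of the target branch (momentum 0.99 is an assumption).
with torch.no_grad():
    for p_t, p_o in zip(target_encoder.parameters(), encoder.parameters()):
        p_t.mul_(0.99).add_(0.01 * p_o)
```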
Decompilation aims to transform a low-level programming language (LPL) (e.g., a binary file) into its functionally equivalent high-level programming language (HPL) (e.g., C/C++). It is a core technology in software security, especially in vulnerability discovery and malware analysis. In recent years, with the successful application of neural machine translation (NMT) models in natural language processing (NLP), researchers have tried to build neural decompilers by borrowing the idea of NMT. They formulate the decompilation process as a translation problem between LPL and HPL, aiming to reduce the human cost required to develop decompilation tools and to improve their generalizability. However, state-of-the-art learning-based decompilers do not cope well with compiler-optimized binaries. Since real-world binaries are mostly compiler-optimized, decompilers that do not consider optimized binaries have limited practical significance. In this paper, we propose a novel learning-based approach named NeurDP that targets compiler-optimized binaries. NeurDP uses a graph neural network (GNN) model to convert LPL to an intermediate representation (IR), which bridges the gap between source code and optimized binary. We also design an Optimized Translation Unit (OTU) to split functions into smaller code fragments for better translation performance. Evaluation results on datasets containing various types of statements show that NeurDP can decompile optimized binaries with 45.21% higher accuracy than state-of-the-art neural decompilation frameworks.
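As a rough illustration of the encoder idea only (this is not NeurDP's actual architecture or data pipeline): low-level instructions become graph nodes, edges encode data/control dependencies, and a graph neural network produces node embeddings that a downstream decoder could translate into IR statements. The instruction vocabulary, edges, and dimensions below are illustrative assumptions, and the fragment size stands in for an OTU-sized unit.

```python
# Toy GNN encoder over an instruction dependency graph of one small code fragment.
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.lin = nn.Linear(dim, dim)

    def forward(self, h, adj):                       # mean-aggregate neighbor features + self
        deg = adj.sum(dim=1, keepdim=True).clamp_min(1.0)
        return torch.relu(self.lin((adj @ h) / deg + h))

vocab = {"mov": 0, "add": 1, "cmp": 2, "jle": 3, "ret": 4}
instructions = ["mov", "add", "cmp", "jle", "ret"]   # one tiny fragment (an OTU-sized unit)
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]             # sequential/control-flow dependencies

embed = nn.Embedding(len(vocab), 64)
adj = torch.zeros(len(instructions), len(instructions))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0

h = embed(torch.tensor([vocab[op] for op in instructions]))
for layer in [GCNLayer(64), GCNLayer(64)]:
    h = layer(h, adj)
print(h.shape)                                       # (5, 64) node embeddings for a decoder
```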